Lightweight LCP-Array Construction in Linear Time

Authors

  • Simon Gog
  • Enno Ohlebusch
Abstract

The suffix tree is a very important data structure in string processing, but it suffers from a huge space consumption. In large-scale applications, compressed suffix trees (CSTs) are therefore used instead. A CST consists of three (compressed) components: the suffix array, the LCP-array, and data structures for simulating navigational operations on the suffix tree. The LCP-array stores the lengths of the longest common prefixes of lexicographically adjacent suffixes, and it can be computed in linear time. In this paper, we present new LCP-array construction algorithms that are fast and very space efficient. In practice, our algorithms outperform the currently best algorithms.
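
To make the definition of the LCP-array concrete, here is a minimal sketch in Python. It is not the algorithm presented in this paper: it uses the classic linear-time method of Kasai et al., which takes the text and its suffix array as input. The function names `suffix_array` and `lcp_array_kasai` are chosen for illustration only, and the suffix array is built naively by sorting, which is not linear time.

```python
def suffix_array(text):
    # Naive construction by sorting all suffixes; for illustration only,
    # this is far from linear time.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array_kasai(text, sa):
    # lcp[k] is the length of the longest common prefix of the suffixes
    # starting at sa[k-1] and sa[k]; lcp[0] is defined as 0 here.
    n = len(text)
    rank = [0] * n  # rank[i] = position of suffix i in the suffix array
    for pos, start in enumerate(sa):
        rank[start] = pos

    lcp = [0] * n
    h = 0  # number of characters already known to match
    for i in range(n):  # visit suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # lexicographic predecessor of suffix i
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix shares at least h - 1 characters
        else:
            h = 0
    return lcp


text = "banana"
sa = suffix_array(text)
print(sa)                         # [5, 3, 1, 0, 4, 2]
print(lcp_array_kasai(text, sa))  # [0, 1, 3, 0, 0, 2]
```

For "banana" the sorted suffixes are a, ana, anana, banana, na, nana, so for example the entry 3 records that "ana" and "anana" share the prefix "ana". The counter h drops by at most one per step, which is what bounds the total number of character comparisons linearly.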


Similar resources

Fast and Lightweight LCP-Array Construction Algorithms

The suffix tree is a very important data structure in string processing, but it suffers from a huge space consumption. In large-scale applications, compressed suffix trees (CSTs) are therefore used instead. A CST consists of three (compressed) components: the suffix array, the LCP-array, and data structures for simulating navigational operations on the suffix tree. The LCP-array stores the leng...


Inducing the LCP-Array

We show how to modify the linear-time construction algorithm for suffix arrays based on induced sorting (Nong et al., DCC’09) such that it computes the array of longest common prefixes (LCP-array) as well. Practical tests show that this outperforms recent LCP-array construction algorithms (Gog and Ohlebusch, ALENEX’11).


Two Space Saving Tricks for Linear Time LCP Array Computation

In this paper we consider the linear time algorithm of Kasai et al. [6] for the computation of the Longest Common Prefix (LCP) array given the text and the suffix array. We show that this algorithm can be implemented without any auxiliary array in addition to the ones required for the input (the text and the suffix array) and the output (the LCP array). Thus, for a text of length n, we reduce t...


Critique "Lightweight LCP Construction for Next-Generation Sequencing Datasets"

The paper presents the first lightweight method that simultaneously computes the longest common prefix array (LCP) and the BWT of very large collections of sequences. Knowing the LCP of a collection of DNA sequences would facilitate the rapid computation of maximal exact matches, shortest unique substrings and shortest absent words. CPU-efficient algorithms for computing the LCP of a string have been describ...


Advanced topics in algorithms

Lowest common ancestor algorithms are in [12, 19, 2]. Algorithms to construct suffix trees in linear time are in [22, 18, 21, 5]. Suffix arrays were introduced in [17]. The linear time construction algorithm for suffix arrays is from [14]. The simple construction of the LCP array from the suffix array is from [15]. The k-mismatch problem is discussed in [7, 16, 1]. The FM index is from [6]. Som...



Journal:
  • CoRR

Volume abs/1012.4263

Publication date 2010